Deep Reinforcement Learning with POMDPs
Abstract
Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has treated the Atari problem as a partially observable Markov decision process (POMDP) by adding imperfect state information through image flickering [2]. However, these approaches rely on a convolutional network structure [3] for the DQN and require the state space to be represented as a two-dimensional grid. This works well on problems whose state space is naturally two-dimensional (e.g., the Atari screen), but does not generalize to other problems. This project aims to extend DQNs to reinforcement learning with POMDPs without the limitation of a two-dimensional state-space structure. We develop a novel approach to solving POMDPs that learns policies from a model-based representation, using a DQN to map POMDP beliefs to an optimal action. We also introduce a reinforcement learning approach for POMDPs that maps an action-observation history to an optimal action using a DQN.
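As an illustrative sketch of the first, model-based approach (not code from the paper), the snippet below maintains an exact Bayes-filter belief over a discrete state space and feeds it to a small feedforward DQN that outputs Q-values per action. All names (BeliefDQN, belief_update, T, Z) and layer sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class BeliefDQN(nn.Module):
    """Feedforward DQN mapping a POMDP belief vector to Q-values per action."""
    def __init__(self, belief_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(belief_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, belief: torch.Tensor) -> torch.Tensor:
        return self.net(belief)

def belief_update(b: torch.Tensor, a: int, o: int,
                  T: torch.Tensor, Z: torch.Tensor) -> torch.Tensor:
    """Exact Bayes-filter belief update for a discrete POMDP.
    T[s, a, s'] = P(s' | s, a); Z[a, s', o] = P(o | s', a)."""
    pred = T[:, a, :].t() @ b   # predict: P(s') = sum_s T[s, a, s'] b(s)
    post = Z[a, :, o] * pred    # correct with the observation likelihood
    return post / post.sum()

def select_action(q_net: BeliefDQN, belief: torch.Tensor, eps: float) -> int:
    """Epsilon-greedy action selection over belief-conditioned Q-values."""
    if torch.rand(1).item() < eps:
        return torch.randint(q_net.net[-1].out_features, (1,)).item()
    with torch.no_grad():
        return q_net(belief).argmax().item()
```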
Similar Papers
Feature Reinforcement Learning using Looping Suffix Trees
There has recently been much interest in history-based methods using suffix trees to solve POMDPs. However, these suffix trees cannot efficiently represent environments that have long-term dependencies. We extend the recently introduced CTΦMDP algorithm to the space of looping suffix trees which have previously only been used in solving deterministic POMDPs. The resulting algorithm replicates r...
Solving Deep Memory POMDPs with Recurrent Policy Gradients
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic elig...
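For illustration only, the sketch below captures the general idea described in this abstract: an LSTM policy over observation histories, trained by backpropagating return-weighted log-likelihoods through time. It uses plain REINFORCE rather than the paper's exact characteristic-eligibility estimator, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """LSTM policy whose hidden state serves as a limited memory of the past."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) -> per-step action logits
        h, _ = self.lstm(obs_seq)
        return self.head(h)

def reinforce_loss(policy: RecurrentPolicy, obs_seq: torch.Tensor,
                   actions: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """Return-weighted negative log-likelihood; gradients flow back
    through the LSTM via backpropagation through time."""
    logits = policy(obs_seq)                                      # (B, T, A)
    logp = torch.log_softmax(logits, dim=-1)
    logp_a = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)   # (B, T)
    return -(returns * logp_a).mean()
```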
Exploration in POMDPs
In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving known partially observable Markov decision processes (POMDPs) have been proposed. In this paper we review the similarities and differences between those two domains and propose methods to deal with them simultaneously. This enables us to attack the Bayes-optimal reinforcement learning problem in...
On Improving Deep Reinforcement Learning for POMDPs
Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learn...
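A rough sketch of the action-conditioning idea behind ADRQN as described above: the recurrent Q-network consumes (previous action, observation) pairs so its hidden state summarizes the full action-observation history. The details below (one-hot action encoding, layer sizes, the name ADRQNSketch) are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class ADRQNSketch(nn.Module):
    """Recurrent Q-network conditioned on (previous action, observation) pairs."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.lstm = nn.LSTM(obs_dim + n_actions, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor,
                prev_actions: torch.Tensor) -> torch.Tensor:
        # obs_seq: (B, T, obs_dim); prev_actions: (B, T) integer action indices
        a_onehot = nn.functional.one_hot(prev_actions, self.n_actions).float()
        h, _ = self.lstm(torch.cat([obs_seq, a_onehot], dim=-1))
        return self.q_head(h)   # per-step Q-values: (B, T, n_actions)
```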
Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs
In many multi-agent applications such as distributed sensor nets, a network of agents acts collaboratively under uncertainty and local interactions. Networked Distributed POMDP (ND-POMDP) provides a framework to model such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in pract...